12 research outputs found

    Towards Harmful Erotic Content Detection through Coreference-Driven Contextual Analysis

    Adult content detection still poses a great challenge for automation. Existing classifiers primarily focus on distinguishing between erotic and non-erotic texts, but they often lack nuance in assessing potential harm. Content of this nature is also beyond the reach of generative models: ethical restrictions prohibit large language models (LLMs) from analyzing and classifying harmful erotica, let alone generating it to create synthetic datasets for other neural models. Where data is this scarce and difficult to obtain, a thorough analysis of the structure of such texts, rather than a large model, may offer a viable solution. This is especially so because harmful erotic narratives, despite appearing similar to harmless ones, usually reveal their harmful nature first through contextual information hidden in the non-sexual parts of the narrative. This paper introduces a hybrid neural and rule-based context-aware system that leverages coreference resolution to identify harmful contextual cues in erotic content. Collaborating with professional moderators, we compiled a dataset and developed a classifier capable of distinguishing harmful from non-harmful erotic content. Our hybrid model, tested on Polish text, demonstrates a promising accuracy of 84% and a recall of 80%. Models based on RoBERTa and Longformer without explicit use of coreference chains achieved significantly weaker results, underscoring the importance of coreference resolution in detecting content as nuanced as harmful erotica. This approach also offers the potential for enhanced visual explainability, supporting moderators in evaluating predictions and taking the necessary actions to address harmful content.
    Comment: Accepted for the 6th Workshop on Computational Models of Reference, Anaphora and Coreference at the EMNLP 2023 Conference.

    Polish Ashbery back in America?

    John Ashbery’s poetry was introduced to Polish readers for the first time in 1986, in the now legendary "blue" issue of the world literature review "Literatura na Świecie". Although the American poet spoke vicariously through the voice of Piotr Sommer’s translatory ventriloquism, and later also via other translators’ voices, it is Andrzej Sosnowski who is typically perceived as the ambassador of Ashbery in Poland. He has been popularizing Ashbery’s poetry through translations and criticism and, more importantly, by publishing his own poetry, which has grown out of Ashberian verse. Still, easy parallels should be avoided, especially if one considers translations of Sosnowski’s poetry into English. A discussion of a rendition by Rod Mengham, an American poet who translates in cooperation with the author, raises interesting questions about the limits and hermeneutics of translation and the mediation of authorial intention, form and language. Can this poetry, so deeply rooted in the American tradition, resist translation into English? This article attempts to answer some of these questions by studying the processes and (im)possibilities of translating Sosnowski’s poetry.

    The Grammar and Syntax Based Corpus Analysis Tool For The Ukrainian Language

    This paper provides an overview of StyloMetrix, a text mining tool developed initially for the Polish language, later extended to English, and recently to Ukrainian. StyloMetrix is built upon various metrics crafted manually by computational linguists and researchers from literary studies to analyze grammatical, stylistic, and syntactic patterns. The idea of statistically evaluating syntactic and grammatical features is straightforward and familiar for languages such as English, Spanish, and German; it is yet to be developed for low-resource languages like Ukrainian. We describe the StyloMetrix pipeline and report some experiments with the tool on a text classification task. We also describe the main limitations of our package and the evaluation procedure for the metrics.

    StyloMetrix: An Open-Source Multilingual Tool for Representing Stylometric Vectors

    This work provides an overview of the open-source multilingual tool StyloMetrix. It offers stylometric text representations that cover various aspects of grammar, syntax and lexicon. StyloMetrix covers four languages: Polish as the primary language, English, Ukrainian and Russian. The normalized output of each feature can serve as a fruitful source for machine learning models and a valuable addition to the embedding layer of any deep learning algorithm. We strive to provide a concise but exhaustive overview of the application of StyloMetrix vectors and to explain the sets of linguistic features developed. The experiments have shown promising results in supervised content classification with simple algorithms such as Random Forest Classifier, Voting Classifier, Logistic Regression and others. The deep learning assessments have shown the usefulness of StyloMetrix vectors in enhancing an embedding layer extracted from Transformer architectures. StyloMetrix has proven to be a strong source of features for machine learning and deep learning algorithms across different classification tasks.
    Comment: 26 pages, 6 figures, pre-print for the conference.

    BAN-PL: a Novel Polish Dataset of Banned Harmful and Offensive Content from Wykop.pl web service

    Advances in automated detection of offensive language online, including hate speech and cyberbullying, require improved access to publicly available datasets comprising social media content. In this paper, we introduce BAN-PL, the first open dataset in the Polish language that encompasses texts flagged as harmful and subsequently removed by professional moderators. The dataset contains a total of 691,662 pieces of content, including both posts and comments, from Wykop.pl, a popular social networking service often referred to as the "Polish Reddit", and is evenly distributed between two distinct classes: "harmful" and "neutral". We provide a comprehensive description of the data collection and preprocessing procedures and highlight the linguistic specificity of the data. The BAN-PL dataset, along with advanced preprocessing scripts for, among other things, unmasking profanities, will be publicly available.

    Translator toward the Other: tropes and signatures

    Faculty of Polish and Classical Philology: Department of Literary Theory, 20th-Century Literature and the Art of Translation.
    This dissertation attempts to offer a new critical model of translation analysis. The proposed method is based on translation tropics, an idea presented by Douglas Robinson in his book Translator’s Turn, but it has been substantially modified and extended. Five tropes (irony, metonymy, synecdoche, hyperbole and metalepsis) correspond to five types of translators, each characterized by the affective motivation behind their translational decisions: the translator’s affects toward the Other of the source text and culture, such as Derridian “hostipitality” or the “abject” introduced by Butler. Besides the theoretical proposal, the dissertation contains analyses and interpretations of particular translational cases. They are based on works by Andrzej Sosnowski, John Ashbery, Tkaczyszyn-Dycki, Charles Bernstein, Reiner Kunze and Boris Akunin, and trace the affective motivations of the following translators: Andrzej Sosnowski, Rod Mengham, Doreen Daume, Jakub Ekier, the VERSATORIUM group and Jan Gondowicz.

    GAN and GPT-2 neural networks, worn words and creativity, namely literary second-hand

    Is creativity an exclusively human domain? Can a neural network, even one with the most sophisticated architecture, fed with material created and chosen by humans, be creative, and even if it can, will its work not be derivative of human work? Or perhaps, as Bakhtin and, after him, Kristeva argued, every utterance of ours is destined to be derivative anyway, because such is the nature of language? What is creativity, what can artificial intelligence do, and what critical literary reflections can its output prompt, especially in the context of intertextual and interpoetic relations? In this article I look for answers using the example of GAN-type neural networks and the GPT-2 model. Apart from fragments of the analyzed texts and references to literary theory, the article also includes an introduction to the structure and essence of the technological solutions discussed.

    From Intersemiotic Translation to Tie-In Products, or Transmedial Storytelling as a Translation Strategy

    Using examples from literature, film, music, and some that elude unambiguous genre classification, the article presents translation series, understood as sequences of works (or even products) that interpret an original work (or each other) using other media. The traditional approach, which invokes intersemiotic translation in every case of non-linguistic translation, is here extended to include the marketing and commercial motivations for such translation operations, known as transmedial storytelling and the tie-in product sales strategy.

    Far Beyond Google Translate: Natural Language Processing (NLP) in Translation and Translatology

    The role of progress is paradoxical: the more technological development there is, the greater the participation of the human in formulating tasks and problems, supervising and correcting automated processes, and interpreting their outcomes. The hierarchy is preserved and humans are still indispensable, but this does not mean that in certain areas the potential of machines does not actually exceed that of humans, or that this advantage is not worth exploiting. Natural language processing (NLP) is not a young field, but in recent years, thanks to the flourishing of deep learning methods, the vogue for data and knowledge mining, and new hardware interfaces (including advanced image recognition), computer text analysis has been experiencing a real renaissance. As far as translation is concerned, it is mostly algorithms for machine translation that are being discussed. This article, on the other hand, presents a slightly broader spectrum of the translation process and looks at the elements that accompany translation (such as criticism), in which the use of NLP methods may bring new and interesting results: results which, due to limited computing power, humans are unable to achieve. The discussion covers such aspects as the vector representation of language, stylometry and its applications, and the analysis of large data sets, all for the purposes of translation and translatology broadly understood.